Take-home Exercise 1: Geospatial Analytics for Public Good

Published

November 28, 2023

1 Introduction

1.1 Background

Buses and the Mass Rapid Transit (MRT) system are the main modes of transport used by Singaporeans for their daily commutes. With an ever increasing population, the key challenge for public bus operators is balancing supply against demand by optimising its services, fleets, and manpower for different bus routes.

Hence, the identification and analysis of the movement patterns of public buses can provide insights on commuting in Singapore, allowing for the development of better urban management strategies by both the public and private sectors.

1.2 The Study Area

The study area is Singapore, a city-state in Southeast Asia, with a total land area of about 730 square kilometres and a total population of 5.92 million (as at June 2023).

Commuting by public bus is the most common mode of transport in Singapore, with an average daily ridership of about 3.461 million in 2022, according to the LTA. The public bus fleet is approximately 5,800, plying more than 300 routes and 5,000 bus stops.

1.3 The Analytical Questions

In this take-home exercise, I aim to answer the following analytical questions:

  • Is the demand for public bus transport evenly distributed geographically?

  • If no, are there signs of spatial clustering?

  • If yes, where are these clusters, and what kind of clusters are they?

  • Based on the analysis, what are the insights that can be derived for urban transport planning?

1.4 Objectives and Tasks

Hence, the objectives of this take-home exercise are to:

  1. Apply Exploratory Spatial Data Analysis (ESDA) to obtain a preliminary understanding of public bus movements in Singapore; and

  2. Apply appropriate Local Indicators of Spatial Association (LISA) Analysis or Emerging Hot Spot Analysis (EHSA) to undercover the spatial and/or spatio-temporal mobility patterns of public bus passengers in Singapore.

The detailed tasks are:

  1. Geovisualisation and Analysis:

    • Compute passenger trips generated by origin at the hexagon level for:

      Peak Hour Period Bus Tap On Time
      Weekday Morning Peak 6am to 9am
      Weekday Afternoon Peak 5pm to 8pm
      Weekend/Holiday Morning Peak 11am to 2pm
      Weekend/Holiday Evening Peak 4pm to 7pm
    • Display the geographical distribution of the passenger trips using appropriate geovisualisation method.

    • Describe the spatial patterns revealed by the geovisualisation.

EITHER:

  1. Local Indicators of Spatial Association (LISA) Analysis:

    • Compute LISA of the passengers trips generated by origin at the hexagon level.

    • Display the LISA maps of the passengers trips generated by origin at the hexagon level - for those that are statistically significant.

    • Draw statistical conclusions from the analysis results.

OR:

  1. Emerging Hot Spot Analysis (EHSA) with reference to the passenger trips generated by origin at the hexagon level for the four time intervals explored in 1. above:

    • Perform Mann-Kendall Test using the spatio-temporal local Gi* values.

    • Prepare EHSA maps of the Gi* values of the passenger trips generated by origin at the hexagon level - for those that are statistically significant.

    • Describe the spatial patterns revealed with reference to the EHSA maps and data visualisation prepared.

Either 2. or 3. is required to be completed. I have attempted to complete both to obtain a more comprehensive understanding of the patterns in the data.

2 Getting Started

2.1 Setting the Analytical Tools

The R packages used in this take-home exercise are:

  • tidyverse (i.e. readr, tidyr, dplyr) for performing data science tasks such as importing, tidying, and wrangling data;

  • sf for importing, managing, and processing geospatial data;

  • sfdep for analysing spatial dependence and spatial relationships in data (building on spdep);

  • tmap for thematic mapping;

  • mapview for interactive viewing of spatial objects;

  • plotly for making interactive plots; and

  • knitr for embedding R code in different document formats (e.g., HTML) to facilitate dynamic report generation.

They are loaded into the R environment using the following code:

pacman::p_load(tidyverse, sf, sfdep, tmap, mapview, plotly, knitr)

2.2 Data Sources

The Land Transport Authority (LTA) studies commuter movements using the data collected from the use of smart cards and the Global Positioning System (GPS) devices on public buses. The LTA DataMall shares some of these data publicly, which helps to speed up the development of practical solutions to enhance reliability and efficiency of the transport system by other stakeholders, such as the private sector and individuals.

The data sets used in this take-home exercise are:

  1. Passenger Volume by Origin Destination Bus Stops from the LTA DataMall for the period of August 2023 to October 2023. There are three aspatial data sets (one for each month) in csv format. It contains data on the number of trips by weekdays and weekends from origin to destination bus stops.

  2. Bus Stop Location from the LTA DataMall. This is a geospatial data set in ESRI shapefile format. It contains a point representation that indicates the position of each bus stop where buses stop to pick up or drop off passengers.

The data sets are placed under two sub-folders:

  • geospatial (Bus Stop Location), and

  • aspatial (Passenger Volume by Origin Destination Bus Stops).

These two sub-folders are within the data folder of my In-class_Ex1 folder.

3 Data Wrangling

3.1 Importing and Merging the Aspatial Data - Passenger Volume

The three csv files are imported and combined into a tibble data frame, odbus, using functions from the base package as shown in the following codes:

  1. Generate list of file names of csv files using the list.files() function in the base package.
filenames = list.files(path="data/aspatial/", pattern="*.csv")

filenames
[1] "origin_destination_bus_202308.csv" "origin_destination_bus_202309.csv"
[3] "origin_destination_bus_202310.csv"
  1. Generate path to csv files using the file.path() function in the base package.
path = file.path("data/aspatial", filenames)

path
[1] "data/aspatial/origin_destination_bus_202308.csv"
[2] "data/aspatial/origin_destination_bus_202309.csv"
[3] "data/aspatial/origin_destination_bus_202310.csv"
  1. Merge the three csv files using the do.call() and lapply() functions in the base package and the read.csv() function from the readr package.
odbus = do.call("rbind", lapply(path, FUN=function(files){ read.csv(files)}))
  1. Check using the unique() function in the base package to confirm that the data sets for each of the three months have been incorporated. [Note: Although the three data sets have been merged into a single data frame, the observations would be analysed month by month, instead of aggregating all observations.]
unique(odbus$YEAR_MONTH)
[1] "2023-08" "2023-09" "2023-10"

Once the combined data set has been obtained, the n_distinct() function in the dplyr package and the sum() function in the base package are applied to uncover the following observations regarding odbus:

  • Overall, there are 17,118,005 rows (observations) and 7 columns. 5,709,512, 5,714,196, and 5,694,297 rows (observations) from August, September, and October 2023 respectively, reflecting a generally stable total ridership over the three-month period.

  • The “YEAR_MONTH” column has three unique values, reflecting the observations from August, September, and October 2023.

  • The “DAY_TYPE” column has two unique values, reflecting observations from weekday or weekends/holiday.

  • The “TIME_PER_HOUR” has 24 unique values, reflecting that the observations are broken down into each of the 24 hours of a day.

  • The “PT_TYPE” column refers only has the value of “BUS” as the public transport type. Hence, it may be dropped.

  • The “ORIGIN_PT_CODE” and “DESTINATION_PT_CODE” columns have 5,075 and 5,079 unique values respectively, reflecting the number of bus stops with bus routes passing through them.

  • The “TOTAL_TRIPS” column contains values that reflect the number of passengers for each unique month, type of day, origin bus stop, destination bus stop. The minimum value is 1, i.e., there are no rows with zero values.

sapply(odbus, function(x) n_distinct(x))
         YEAR_MONTH            DAY_TYPE       TIME_PER_HOUR             PT_TYPE 
                  3                   2                  24                   1 
     ORIGIN_PT_CODE DESTINATION_PT_CODE         TOTAL_TRIPS 
               5075                5079                4854 
sum(odbus$YEAR_MONTH == "2023-08")
[1] 5709512
sum(odbus$YEAR_MONTH == "2023-09")
[1] 5714196
sum(odbus$YEAR_MONTH == "2023-10")
[1] 5694297

3.2 Importing and Transforming the Geospatial Data - Bus Stop Location

The Bus Stop Location shapefile is imported using st_read() in the sf package. The output is a simple feature data frame, busstop, which is in the SVY21 projected coordinates systems. It has 5,161 features and 3 fields, which include the geometry points indicating the bus stop locations.

busstop = st_read(dsn = "data/geospatial",
                  layer = "BusStop")
Reading layer `BusStop' from data source 
  `C:\jmphosis\ISSS624\Take-home_Ex\Take-home_Ex1\data\geospatial' 
  using driver `ESRI Shapefile'
Simple feature collection with 5161 features and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 3970.122 ymin: 26482.1 xmax: 48284.56 ymax: 52983.82
Projected CRS: SVY21
glimpse(busstop)
Rows: 5,161
Columns: 4
$ BUS_STOP_N <chr> "22069", "32071", "44331", "96081", "11561", "66191", "2338…
$ BUS_ROOF_N <chr> "B06", "B23", "B01", "B05", "B05", "B03", "B02A", "B02", "B…
$ LOC_DESC   <chr> "OPP CEVA LOGISTICS", "AFT TRACK 13", "BLK 239", "GRACE IND…
$ geometry   <POINT [m]> POINT (13576.31 32883.65), POINT (13228.59 44206.38),…

The st_crs() function in the sf package is then used to check the coordinate system of the busstop simple feature data frame. The output shows that although it is projected in SVY21, the EPSG is indicated as 9001, which is incorrect given that it should be 3414 instead.

st_crs(busstop)
Coordinate Reference System:
  User input: SVY21 
  wkt:
PROJCRS["SVY21",
    BASEGEOGCRS["WGS 84",
        DATUM["World Geodetic System 1984",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]],
            ID["EPSG",6326]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["Degree",0.0174532925199433]]],
    CONVERSION["unnamed",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["Degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["(E)",east,
            ORDER[1],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]],
        AXIS["(N)",north,
            ORDER[2],
            LENGTHUNIT["metre",1,
                ID["EPSG",9001]]]]

To correct the EPSG, the st_set_crs() function in the sf package is applied. A check to confirm that the projection transformation has been applied is then made using the st_crs() function again.

busstop = st_set_crs(busstop, 3414)
st_crs(busstop)
Coordinate Reference System:
  User input: EPSG:3414 
  wkt:
PROJCRS["SVY21 / Singapore TM",
    BASEGEOGCRS["SVY21",
        DATUM["SVY21",
            ELLIPSOID["WGS 84",6378137,298.257223563,
                LENGTHUNIT["metre",1]]],
        PRIMEM["Greenwich",0,
            ANGLEUNIT["degree",0.0174532925199433]],
        ID["EPSG",4757]],
    CONVERSION["Singapore Transverse Mercator",
        METHOD["Transverse Mercator",
            ID["EPSG",9807]],
        PARAMETER["Latitude of natural origin",1.36666666666667,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8801]],
        PARAMETER["Longitude of natural origin",103.833333333333,
            ANGLEUNIT["degree",0.0174532925199433],
            ID["EPSG",8802]],
        PARAMETER["Scale factor at natural origin",1,
            SCALEUNIT["unity",1],
            ID["EPSG",8805]],
        PARAMETER["False easting",28001.642,
            LENGTHUNIT["metre",1],
            ID["EPSG",8806]],
        PARAMETER["False northing",38744.572,
            LENGTHUNIT["metre",1],
            ID["EPSG",8807]]],
    CS[Cartesian,2],
        AXIS["northing (N)",north,
            ORDER[1],
            LENGTHUNIT["metre",1]],
        AXIS["easting (E)",east,
            ORDER[2],
            LENGTHUNIT["metre",1]],
    USAGE[
        SCOPE["Cadastre, engineering survey, topographic mapping."],
        AREA["Singapore - onshore and offshore."],
        BBOX[1.13,103.59,1.47,104.07]],
    ID["EPSG",3414]]

3.3 Checking for Duplicates and Missing Values

The data sets from the LTA DataMall are expected to be relatively clean. Nevertheless, due diligence checks for duplicates and missing values are still made to confirm the assumptions.

  • The duplicated() function in the base package is used to check for duplicates in odbus. There are no duplicates in the tibble data frame.
odbus[duplicated(odbus), ]
[1] YEAR_MONTH          DAY_TYPE            TIME_PER_HOUR      
[4] PT_TYPE             ORIGIN_PT_CODE      DESTINATION_PT_CODE
[7] TOTAL_TRIPS        
<0 rows> (or 0-length row.names)
  • The duplicated() function in the base package and st_geometry() function in the sf package are used to check for duplicates in busstop. The output returned Bus Stop No. 96319 at row 3265. On closer inspection of the simple feature data frame, rows 3264 and 3265 are found to be duplicates. Hence, the row 3264 is removed using the row_number() function in the dplyr package. The same check is conducted again to confirm that there are no more duplicates.
busstop[duplicated(st_geometry(busstop)), ]
Simple feature collection with 1 feature and 3 fields
Geometry type: POINT
Dimension:     XY
Bounding box:  xmin: 42187.23 ymin: 34995.78 xmax: 42187.23 ymax: 34995.78
Projected CRS: SVY21 / Singapore TM
     BUS_STOP_N BUS_ROOF_N        LOC_DESC                  geometry
3265      96319        NIL YUSEN LOGISTICS POINT (42187.23 34995.78)
busstop = busstop %>%
  filter(row_number() != 3264)

busstop[duplicated(st_geometry(busstop)), ]
Simple feature collection with 0 features and 3 fields
Bounding box:  xmin: NA ymin: NA xmax: NA ymax: NA
Projected CRS: SVY21 / Singapore TM
[1] BUS_STOP_N BUS_ROOF_N LOC_DESC   geometry  
<0 rows> (or 0-length row.names)
  • The is.na() and sum() functions in the base package and the summarise_all() function in the dplyr package are used to check for missing values in odbus. There are no missing values in the tibble data frame.
odbus %>% 
  summarise_all(~ sum(is.na(.)))
  YEAR_MONTH DAY_TYPE TIME_PER_HOUR PT_TYPE ORIGIN_PT_CODE DESTINATION_PT_CODE
1          0        0             0       0              0                   0
  TOTAL_TRIPS
1           0
  • The is.na() and sum() functions in the base package are used to check for missing values in busstop. There are no missing values in the simple feature data frame.
sum(is.na(st_geometry(busstop)))
[1] 0

3.4 Aggregating and Selecting Columns

Given that the analysis focuses on passenger volume by origin bus stop, the group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) by “ORIGIN_PT_CODE”, “YEAR_MONTH”, “DAY_TYPE”, and “TIME_PER_HOUR” in odbus. At the same time, the columns “DESTINATION_PT_CODE” and “PT_TYPE” are dropped as they are not required. The number of rows (observations) are condensed from 17,118,005 into 571,696.

odbus = odbus %>%
  group_by(ORIGIN_PT_CODE, YEAR_MONTH, DAY_TYPE, TIME_PER_HOUR) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The select() function in dplyr package is used to drop the “BUS_ROOF_N” column in busstop since it is not required.

busstop = busstop %>%
  select(-BUS_ROOF_N)

3.5 Performing Relational Join

The simple feature data frame, busstop, and the tibble data frame, odbus, are combined using the inner_join() function in the dplyr package, by matching the “BUS_STOP_N” column in busstop with the “ORIGIN_PT_CODE” in odbus. However, as the “BUS_STOP_N” values are character values, they would need to be converted to integer values first before the join, so that values such as “01012” in busstop can be matched to values such as “1012” in odbus.

Upon joining the two data frames, the output is the simple feature data frame (since busstop was placed as the first argument in the inner_join() function), bus. The number of rows (observations) was reduced from 571,696 to 567,725, i.e., 3,841 rows are dropped. This is likely because some new bus stops such as Bus Stop No. 3549 (Shenton Way Stn Exit 3) and Bus Stop No. 96461 (Bef Aviation Pk Rd) were not included in the Bus Stop Location shapefile that was last updated in July 2023. This means that the analysis in this take-home exercise is limited in this aspect.

busstop = busstop %>%
  mutate(BUS_STOP_N = as.integer(BUS_STOP_N))

bus = inner_join(busstop, odbus, 
                by = c("BUS_STOP_N" = "ORIGIN_PT_CODE"))

3.6 Creating Spatial Hexagon Grid

Spatial grids are commonly used in spatial analysis. Regularly shaped grids may comprise of equilateral triangles, squares, or hexagons. The hexagon is the most circular-shaped polygon that can tessellate to form an evenly spaced grid, providing a low perimeter-to-area ratio that reduces sampling bias due to edge effects of the grid shape.

A more circular polygon means that points near the border are closer to the centroid. Hexagons are often used when the analysis involves aspects of connectivity or movement paths. Locating neighbours is simpler using a hexagon grid because the contact edge or length is consistent on each side, resulting in equidistant centroids for each neighbour.

A spatial hexagon grid, hgrid, for the study area of Singapore is created using the following codes:

area_hexagon_grid = st_make_grid(bus, c(500, 500), what = "polygons", square = FALSE)

hgrid = st_sf(area_hexagon_grid) %>%
  mutate(grid_id = 1:length(lengths(area_hexagon_grid)))

4 Exploratory Data Analysis - Computing, Visualising, and Deriving Insights on Passenger Trips by Origin

The same steps for 4.1 are repeated for the three other time periods studied in 4.2, 4.3, and 4.4.

4.1 Weekday Morning Peak (6am to 9.59am)

The same steps for 4.1.1 are repeated for the two other months studied in 4.1.2, and 4.1.3.

4.1.1 August 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, wdAM_Aug, for the passenger trips by origin during weekday morning peak periods in August 2023.

wdAMAug = bus %>%
  filter(YEAR_MONTH == "2023-08") %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 9)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

wdAMAug_ct = st_join(hgrid, wdAMAug) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package. To facilitate comparisons between different peak periods, the breaks are set at fixed intervals of 50,000 passengers.

brk = c(1, 50000, 100000, 150000, 200000, 250000, 300000, 350000)
tmap_mode("view")

map_wdAMAug = tm_shape(wdAMAug_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekday Morning Peak in Aug 2023")

map_wdAMAug

4.1.2 September 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, wdAM_Sep, for the passenger trips by origin during weekday morning peak periods in September 2023.

wdAMSep = bus %>%
  filter(YEAR_MONTH == "2023-09") %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 8)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

wdAMSep_ct = st_join(hgrid, wdAMSep) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package.

tmap_mode("view")

map_wdAMSep = tm_shape(wdAMSep_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekday Morning Peak in Sep 2023")

map_wdAMSep

4.1.3 October 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, wdAM_Oct, for the passenger trips by origin during weekday morning peak periods in October 2023.

wdAMOct = bus %>%
  filter(YEAR_MONTH == "2023-10") %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 6 & TIME_PER_HOUR <= 8)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

wdAMOct_ct = st_join(hgrid, wdAMOct) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package.

tmap_mode("view")

map_wdAMOct = tm_shape(wdAMOct_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekday Morning Peak in Oct 2023")

map_wdAMOct

4.1.4 Insights

xxx

4.2 Weekday Afternoon Peak (5pm to 7.59pm)

The same steps for 4.2.1 are repeated for the two other months studied in 4.2.2, and 4.2.3.

4.2.1 August 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, wdPM_Aug, for the passenger trips by origin during weekday afternoon peak periods in August 2023.

wdPMAug = bus %>%
  filter(YEAR_MONTH == "2023-08") %>% 
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 19)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

wdPMAug_ct = st_join(hgrid, wdPMAug) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package.

tmap_mode("view")

map_wdPMAug = tm_shape(wdPMAug_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekday Afternoon Peak in Aug 2023")

map_wdPMAug

4.2.2 September 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, wdPM_Sep, for the passenger trips by origin during weekday afternoon peak periods in September 2023.

wdPMSep = bus %>%
  filter(YEAR_MONTH == "2023-09") %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 19)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

wdPMSep_ct = st_join(hgrid, wdPMSep) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package.

tmap_mode("view")

map_wdPMSep = tm_shape(wdPMSep_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekday Afternoon Peak in Sep 2023")

map_wdPMSep

4.2.3 October 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, wdPM_Oct, for the passenger trips by origin during weekday afternoon peak periods in October 2023.

wdPMOct = bus %>%
  filter(YEAR_MONTH == "2023-10") %>%
  filter(DAY_TYPE == "WEEKDAY") %>%
  filter(TIME_PER_HOUR >= 17 & TIME_PER_HOUR <= 19)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

wdPMOct_ct = st_join(hgrid, wdPMOct) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package.

tmap_mode("view")

map_wdPMOct = tm_shape(wdPMOct_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekday Afternoon Peak in Oct 2023")

map_wdPMOct

4.2.4 Insights

xxx

4.3 Weekend/Holiday Morning Peak (11am to 1.59pm)

The same steps for 4.3.1 are repeated for the two other months studied in 4.3.2, and 4.3.3.

4.3.1 August 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, weAM_Aug, for the passenger trips by origin during weekend/holiday morning peak periods in August 2023.

weAMAug = bus %>%
  filter(YEAR_MONTH == "2023-08") %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 13)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

weAMAug_ct = st_join(hgrid, weAMAug) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package.

tmap_mode("view")

map_weAMAug = tm_shape(weAMAug_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekend/Holiday Morning Peak in Aug 2023")

map_weAMAug

4.3.2 September 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, weAM_Sep, for the passenger trips by origin during weekend/holiday morning peak periods in September 2023.

weAMSep = bus %>%
  filter(YEAR_MONTH == "2023-09") %>% 
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 13)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

weAMSep_ct = st_join(hgrid, weAMSep) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package.

tmap_mode("view")

map_weAMSep = tm_shape(weAMSep_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekend/Holiday Morning Peak in Sep 2023")

map_weAMSep

4.3.3 October 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, weAM_Oct, for the passenger trips by origin during weekend/holiday morning peak periods in October 2023.

weAMOct = bus %>%
  filter(YEAR_MONTH == "2023-10") %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 11 & TIME_PER_HOUR <= 13)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

weAMOct_ct = st_join(hgrid, weAMOct) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package.

tmap_mode("view")

map_weAMOct = tm_shape(weAMOct_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekend/Holiday Morning Peak in Oct 2023")

map_weAMOct

4.3.4 Insights

xxx

4.4 Weekend/Holiday Evening Peak (4pm to 6.59pm)

The same steps for 4.4.1 are repeated for the two other months studied in 4.4.2, and 4.4.3.

4.4.1 August 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, wePM_Aug, for the passenger trips by origin during weekend/holiday evening peak periods in August 2023.

wePMAug = bus %>%
  filter(YEAR_MONTH == "2023-08") %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 18)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

wePMAug_ct = st_join(hgrid, wePMAug) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package.

tmap_mode("view")

map_wePMAug = tm_shape(wePMAug_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekend/Holiday Evening Peak in Aug 2023")

map_wePMAug

4.4.2 September 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, wePM_Sep, for the passenger trips by origin during weekend/holiday evening peak periods in September 2023.

wePMSep = bus %>%
  filter(YEAR_MONTH == "2023-09") %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 18)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

wePMSep_ct = st_join(hgrid, wePMSep) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package.

tmap_mode("view")

map_wePMSep = tm_shape(wePMSep_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekend/Holiday Evening Peak in Sep 2023")

map_wePMSep

4.4.3 October 2023

The filter() function in the dplyr package is used to obtain a subset of the simple feature data frame, wePM_Oct, for the passenger trips by origin during weekend/holiday evening peak periods in October 2023.

wePMOct = bus %>%
  filter(YEAR_MONTH == "2023-10") %>%
  filter(DAY_TYPE == "WEEKENDS/HOLIDAY") %>%
  filter(TIME_PER_HOUR >= 16 & TIME_PER_HOUR <= 18)

The st_join() function in the sf package and the drop.na() function in the tidyr package are used to match the bus stops points with the hexagons in the grid. The group_by() and summarise() functions in the dplyr package are used to sum the “TOTAL_TRIPS” values (i.e., total passengers) for each hexagon in the grid.

wePMOct_ct = st_join(hgrid, wePMOct) %>%
  drop_na() %>%
  group_by(grid_id) %>%
  summarise(TOTAL_TRIPS = sum(TOTAL_TRIPS))

The output is then plotted use functions in the tmap package.

tmap_mode("view")

map_wePMOct = tm_shape(wePMOct_ct) +
  tm_fill(col = "TOTAL_TRIPS",
    palette = "Reds",
    breaks = brk,
    title = "No. of Passenger Trips",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("No. of Passenger Trips: " = "TOTAL_TRIPS"),
    popup.format = list(TOTAL_TRIPS = list(format = "f", digits = 0))) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "Weekend/Holiday Evening Peak in Oct 2023")

map_wePMOct

4.4.4 Insights

xxx

4.5 Comparison Between Different Peak Periods

4.5.1 August 2023

Weekday AM versus Weekday PM:

tmap_arrange(map_wdAMAug, map_wdPMAug, ncol = 2)

Weekend/Holiday AM versus Weekend/Holiday PM:

tmap_arrange(map_weAMAug, map_wePMAug, ncol = 2)

4.5.2 September 2023

Weekday AM versus Weekday PM:

tmap_arrange(map_wdAMSep, map_wdPMSep, ncol = 2)

Weekend/Holiday AM versus Weekend/Holiday PM:

tmap_arrange(map_weAMSep, map_wePMSep, ncol = 2)

4.5.3 October 2023

Weekday AM versus Weekday PM:

tmap_arrange(map_wdAMOct, map_wdPMOct, ncol = 2)

Weekend/Holiday AM versus Weekend/Holiday PM:

tmap_arrange(map_weAMOct, map_wePMOct, ncol = 2)

4.5.4 Insights

xxx

5 Local Indicators of Spatial Association (LISA) Analysis

Local Indicators of Spatial Association (LISA) are statistics that evaluate the existence of clusters in the spatial arrangement of a given variable. The appropriate LISA, including local Moran’s I, are applied to detect cluster(s) and/or outlier(s) for the Bus Passenger Trips in the bus simple feature data frame, month by month.

5.1 Computing Distance-based Weights

Distance-based weights are used in this take-home exercise. Distance-based weights are more appropriate than contiguity spatial weights in this context given that the hexagons are spread out across Singapore, and some hexagons may not have immediate neighbours (i.e., contiguous) due to reasons such as being separated by reservoirs or waterways.

There are three popularly used distance-based spatial weights, they are: fixed distance weights, adaptive distance weights, and inverse distance weights (IDW). For this take-home exercise, a combination of IDW and adaptive distance weights are used. IDW assigns higher weights to neighbours that are closer and lower weights to neighbours that are further away, while adaptive distance weights balances out the number of neighbours for each hexagon regardless of how densely populated its surrounding areas is (i.e., density of nearby bus stops).

  • The st_knn() function in the sfdep package is used to identify neighbors based on k (i.e. k = 8 indicates the nearest eight neighbours). The output is a list of neighbours (i.e. nb).

  • The st_inverse_distance() function in the sfdep package is then used to calculate inverse distance weights of neighbours in the nb.

For Weekday Morning Peak:

wdAMAug_wt = wdAMAug_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

wdAMSep_wt = wdAMSep_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

wdAMOct_wt = wdAMOct_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

For Weekday Afternoon Peak:

wdPMAug_wt = wdPMAug_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

wdPMSep_wt = wdPMSep_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

wdPMOct_wt = wdPMOct_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

For Weekend/Holiday Morning Peak:

weAMAug_wt = weAMAug_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

weAMSep_wt = weAMSep_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

weAMOct_wt = weAMOct_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

For Weekend/Holiday Evening Peak:

wePMAug_wt = wePMAug_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

wePMSep_wt = wePMSep_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

wePMOct_wt = wePMOct_ct %>% 
  mutate(nb = st_knn(area_hexagon_grid,
                     k=8),
         wt = st_inverse_distance(nb, 
                                   area_hexagon_grid, 
                                   scale = 1, 
                                   alpha = 1), 
         .before = 1)

5.2 Computing Local Indicators of Spatial Association (LISA) of Passenger Trips by Origin

The local_moran() function in the sfdep package is used to compute the local Moran’s I value of “TOTAL_TRIPS” at the hexagon level. The output is a tibble data frame, lisa.

  • The output is a simple feature data frame containing the columns ii, eii, var_ii, z_ii, p_ii, p_ii_sim, and p_folded_sim.
    • ii: Local moran statistic.

    • eii: Expectation of local moran statistic; for local_moran_perm(), the permutation sample means.

    • var_ii: Variance of local moran statistic; for local_moran_perm(), the permutation sample standard deviations.

    • z_ii: Standard deviation of local moran statistic; for local_moran_perm(), based on permutation of sample means and standard deviations

    • p_ii: p-value of local moran statistic using pnorm(); for local_moran_perm(), using standard deviation based on permutation of sample means and standard deviations

    • p_ii_sim: For local_moran_perm(), rank() and punif() of observed statistic rank for [0, 1] p-values using “alternative=”

    • p_folded_sim: The simulation folded [0, 0.5] range ranked p-value.

    • skewness: For local_moran_perm(), the output of e1071::skewness() for the permutation samples underlying the standard deviates

    • kurtosis: For local_moran_perm(), the output of e1071::kurtosis() for the permutation samples underlying the standard deviates.

  • The unnest() function in the tidyr package is used to expand a list-column containing data frames into rows and columns.
  • In terms of interpretation:
    • A positive local Moran’s I value indicates a hexagon close to similar values (High-High or Low-Low);
    • A negative value indicates a hexagon close to dissimilar values (High-Low or Low-High); and
    • A value near zero suggests no significant local spatial autocorrelation.

For Weekday Morning Peak:

wdAMAug_lisa = wdAMAug_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

wdAMSep_lisa = wdAMSep_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

wdAMOct_lisa = wdAMOct_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

For Weekday Afternoon Peak:

wdPMAug_lisa = wdPMAug_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

wdPMSep_lisa = wdPMSep_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

wdPMOct_lisa = wdPMOct_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

For Weekend/Holiday Morning Peak:

weAMAug_lisa = weAMAug_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

weAMSep_lisa = weAMSep_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

weAMOct_lisa = weAMOct_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

For Weekend/Holiday Evening Peak:

wePMAug_lisa = wePMAug_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

wePMSep_lisa = wePMSep_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

wePMOct_lisa = wePMOct_wt %>%
  mutate(local_moran = local_moran(
    TOTAL_TRIPS, nb, wt, nsim = 99),
    .before = 1) %>%
  unnest(local_moran)

5.3 Creating Local Indicators of Spatial Association (LISA) Maps of Passenger Trips by Origin

The LISA map is a categorical map showing outliers and clusters. There are two types of outliers namely: High-Low and Low-High outliers. Likewise, there are two type of clusters namely: High-High and Low-Low clusters. The LISA map is an interpreted map combining local Moran’s I of geographical areas and their respective p-values.

For Weekday Morning Peak:

wdAMAug_lisasig = wdAMAug_lisa  %>%
  filter(p_ii_sim < 0.05)

wdAMAug_lisamap = tm_shape(wdAMAug_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekday Morning Peak in Aug 2023")

wdAMSep_lisasig = wdAMSep_lisa  %>%
  filter(p_ii_sim < 0.05)

wdAMSep_lisamap = tm_shape(wdAMSep_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekday Morning Peak in Sep 2023")

wdAMOct_lisasig = wdAMOct_lisa  %>%
  filter(p_ii_sim < 0.05)

wdAMOct_lisamap = tm_shape(wdAMOct_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekday Morning Peak in Oct 2023")

For Weekday Afternoon Peak:

wdPMAug_lisasig = wdPMAug_lisa  %>%
  filter(p_ii_sim < 0.05)

wdPMAug_lisamap = tm_shape(wdPMAug_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekday Afternoon Peak in Aug 2023")

wdPMSep_lisasig = wdPMSep_lisa  %>%
  filter(p_ii_sim < 0.05)

wdPMSep_lisamap = tm_shape(wdPMSep_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekday Afternoon Peak in Sep 2023")

wdPMOct_lisasig = wdPMOct_lisa  %>%
  filter(p_ii_sim < 0.05)

wdPMOct_lisamap = tm_shape(wdPMOct_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekday Afternoon Peak in Oct 2023")

For Weekend/Holiday Morning Peak:

weAMAug_lisasig = weAMAug_lisa  %>%
  filter(p_ii_sim < 0.05)

weAMAug_lisamap = tm_shape(weAMAug_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekend/Holiday Morning Peak in Aug 2023")

weAMSep_lisasig = weAMSep_lisa  %>%
  filter(p_ii_sim < 0.05)

weAMSep_lisamap = tm_shape(weAMSep_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekend/Holiday Morning Peak in Sep 2023")

weAMOct_lisasig = weAMOct_lisa  %>%
  filter(p_ii_sim < 0.05)

weAMOct_lisamap = tm_shape(weAMOct_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekend/Holiday Morning Peak in Oct 2023")

For Weekend/Holiday Evening Peak:

wePMAug_lisasig = wePMAug_lisa  %>%
  filter(p_ii_sim < 0.05)

wePMAug_lisamap = tm_shape(wePMAug_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekend/Holiday Evening Peak in Aug 2023")

wePMSep_lisasig = wePMSep_lisa  %>%
  filter(p_ii_sim < 0.05)

wePMSep_lisamap = tm_shape(wePMSep_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekend/Holiday Evening Peak in Sep 2023")

wePMOct_lisasig = wePMOct_lisa  %>%
  filter(p_ii_sim < 0.05)

wePMOct_lisamap = tm_shape(wePMOct_lisasig) +
  tm_fill(col = "mean",
    title = "Cluster/Outlier",
    id = "grid_id",
    showNA = FALSE,
    alpha = 0.5,
    popup.vars = c("Cluster/Outlier: " = "mean")) +
  tm_borders(col = "grey40", lwd = 0.7) +
  tm_layout(title = "LISA Map of Weekend/Holiday Evening Peak in Oct 2023")

5.4 Comparison Between Different Peak Periods

5.4.1 August 2023

Weekday AM versus Weekday PM:

tmap_arrange(wdAMAug_lisamap, wdPMAug_lisamap, ncol = 2)

Weekend/Holiday AM versus Weekend/Holiday PM:

tmap_arrange(weAMAug_lisamap, wePMAug_lisamap, ncol = 2)

5.4.2 September 2023

Weekday AM versus Weekday PM:

tmap_arrange(wdAMSep_lisamap, wdPMSep_lisamap, ncol = 2)

Weekend/Holiday AM versus Weekend/Holiday PM:

tmap_arrange(weAMSep_lisamap, wePMSep_lisamap, ncol = 2)

5.4.3 October 2023

Weekday AM versus Weekday PM:

tmap_arrange(wdAMOct_lisamap, wdPMOct_lisamap, ncol = 2)

Weekend/Holiday AM versus Weekend/Holiday PM:

tmap_arrange(weAMOct_lisamap, wePMOct_lisamap, ncol = 2)

Drawing Statistical Conclusions for Analysis Results (200 words per visual)

Extra: 6 Emerging Hot Spot Analysis (EHSA)